Effects of language mixing for automatic recognition of Cantonese-English code-mixing utterances
Abstract
While automatic speech recognition of either Cantonese or English alone has achieved considerable success, recognition of Cantonese-English code-mixing speech is considerably harder. This paper analyzes the effect of language mixing on the recognition performance of code-mixing utterances. By examining recognition results for Cantonese-English code-mixing speech, in which Cantonese is the matrix language and English is the embedded language, we found that the recognition accuracy of the embedded language plays a significant role in overall performance. In particular, performance on the matrix language degrades significantly when the embedded words cannot be recognized correctly. We also studied the error propagation effect of the embedded English words. The results show that an error in an embedded English word may propagate to the two neighboring Cantonese syllables. Finally, we analyze the factors that influence recognition performance on the embedded English words.
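The per-language accuracy and error-propagation measurements described in the abstract can be illustrated with a small sketch. This is not the authors' procedure; it is a hypothetical implementation under stated assumptions: each reference token carries a language tag (`"yue"` for a Cantonese syllable written in Jyutping, `"en"` for an English word), the hypothesis is tokenized into the same units, a reference token counts as an error if it is substituted or deleted after minimum-edit-distance alignment (insertions are ignored because they have no reference language tag), and "neighboring" means the Cantonese syllable immediately to the left or right of an English word. All function names and the toy data are illustrative.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: each reference token is (text, lang), where lang is
# "yue" for a Cantonese syllable (here in Jyutping) or "en" for an English word.
RefToken = Tuple[str, str]


def align(ref: List[str], hyp: List[str]) -> List[Tuple[Optional[int], Optional[int]]]:
    """Minimum-edit-distance alignment; None marks an insertion or a deletion."""
    n, m = len(ref), len(hyp)
    cost = [[0] * (m + 1) for _ in range(n + 1)]  # cost[i][j]: distance ref[:i] vs hyp[:j]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(sub, cost[i - 1][j] + 1, cost[i][j - 1] + 1)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    pairs: List[Tuple[Optional[int], Optional[int]]] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((i - 1, j - 1))  # match or substitution
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((i - 1, None))  # deletion of a reference token
            i -= 1
        else:
            pairs.append((None, j - 1))  # insertion in the hypothesis
            j -= 1
    pairs.reverse()
    return pairs


def ref_error_flags(ref: List[str], hyp: List[str]) -> List[bool]:
    """True for each reference token that was substituted or deleted."""
    flags = [False] * len(ref)
    for r, h in align(ref, hyp):
        if r is not None and (h is None or ref[r] != hyp[h]):
            flags[r] = True
    return flags


def _rate(errors_total: List[int]) -> float:
    errors, total = errors_total
    return errors / total if total else 0.0


def analyze(utterances: List[Tuple[List[RefToken], List[str]]]) -> Dict[str, float]:
    counts = {"yue": [0, 0], "en": [0, 0]}  # per language: [errors, total]
    # Cantonese neighbours of English words, keyed by whether that English word was wrong.
    neighbours = {True: [0, 0], False: [0, 0]}
    for ref_tokens, hyp in utterances:
        ref = [text for text, _ in ref_tokens]
        langs = [lang for _, lang in ref_tokens]
        flags = ref_error_flags(ref, hyp)
        for lang, wrong in zip(langs, flags):
            counts[lang][0] += int(wrong)
            counts[lang][1] += 1
        for k, lang in enumerate(langs):
            if lang != "en":
                continue
            for nb in (k - 1, k + 1):  # immediate left and right neighbours only
                if 0 <= nb < len(ref) and langs[nb] == "yue":
                    neighbours[flags[k]][0] += int(flags[nb])
                    neighbours[flags[k]][1] += 1
    return {
        "yue_error_rate": _rate(counts["yue"]),
        "en_error_rate": _rate(counts["en"]),
        "yue_next_to_wrong_en": _rate(neighbours[True]),
        "yue_next_to_correct_en": _rate(neighbours[False]),
    }


# Toy data: in the first utterance "present" is misrecognized and the following
# Cantonese syllable "ni1" is also lost; the second utterance is recognized correctly.
example = [
    ([("ngo5", "yue"), ("jiu3", "yue"), ("present", "en"), ("ni1", "yue"), ("go3", "yue")],
     ["ngo5", "jiu3", "pre", "sen", "go3"]),
    ([("ngo5", "yue"), ("check", "en"), ("zo2", "yue")],
     ["ngo5", "check", "zo2"]),
]
result = analyze(example)
print(result)
```

On this toy data the Cantonese error rate next to a misrecognized English word (0.5) exceeds the rate next to a correctly recognized one (0.0), which is the kind of contrast the abstract refers to; a real study would compute these rates over a full test set rather than two hand-made utterances.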
Similar resources
Automatic Recognition of Cantonese-English Code-Mixing Speech
Code-mixing is a common phenomenon in bilingual societies. It refers to the intra-sentential switching of two different languages in a spoken utterance. This paper presents the first study on automatic recognition of Cantonese-English code-mixing speech, which is common in Hong Kong. This study starts with the design and compilation of code-mixing speech and text corpora. The problems of acoust...
Automatic speech recognition of Cantones...
This paper describes our recent work on the development of a large-vocabulary, speaker-independent, continuous speech recognition system for Cantonese-English code-mixing utterances. The details of both acoustic modeling and language modeling will be discussed. For acoustic modeling, Cantonese accents in English words are handled by applying cross-lingual acoustic units, as well as modifications...
Development of a Cantonese-English code-mixing speech corpus
This paper describes the design and compilation of the CUMIX Cantonese-English code-mixing speech corpus. Code-mixing is a common phenomenon in many bilingual societies and it usually involves at least two different languages within one utterance. In Hong Kong, people usually mix English words and phrases with Cantonese in their daily conversation. Although there are many monolingual corpora of...
Mainland Chinese Students’ Shifting Perceptions of Chinese-English Code-Mixing in Macao
As a former Portuguese colony, Macao is the only region in China where Cantonese, a variety of Chinese, and English, an international language, are enjoying de facto official statuses, with Putonghua being a quasi-official language and Portuguese being another official language. Recently, with an increasing number of Mainland Chinese students crossing the border to pursue their tertiar...
Code-Mixing and Mixed Verbs in Cantonese-English Bilingual Children: Input and Innovation
In both child and adult Cantonese, code-mixing is used productively. We focus on the insertion of English verbs into Cantonese utterances. Data from nine simultaneous bilingual children in the Hong Kong Bilingual Child Language Corpus are analyzed. Case studies show that the children’s rates of mixing closely match the rate of mixing in the parental input, and that different input conditions in...